Abstract 

We motivate the problem of finding small subgraphs with small bipartiteness (ratio) as a variant 
of detecting small cyber-communities in the Web graph. The bipartiteness ratio of a subgraph $S$, as 
introduced by Trevisan [Tre09], roughly measures how close $S$ is to being a dense bipartite subgraph. We 
give a bicriteria approximation algorithm SwpDB such that if there exists a subset $S$ of volume at most 
$k$ and bipartiteness ratio $\theta$, then for any $0 < \epsilon < 1/2$, it finds a set $S'$ of volume at most 
$2k^{1+\epsilon}$ and bipartiteness at most $4\sqrt{\theta/\epsilon}$. 

By incorporating a truncation operation, we give a local algorithm LocDB, which has asymptotically 
the same approximation guarantee as the algorithm SwpDB on both the volume and bipartiteness of the 
output set, and runs in time $O(\epsilon^2\theta^{-2}k^{1+\epsilon}\ln^3 k)$, independent of the size of the graph. 
Our local algorithm is the first sublinear (in the size of the input graph) time algorithm with almost 
the same guarantee as Trevisan's spectral inequality that relates the bipartiteness of the graph to the 
largest eigenvalue of the (normalized) Laplacian of the graph, and runs in time slightly superlinear in 
the size of the output set. Finally, we give a spectral characterization of the small dense bipartite-like 
subgraphs by using the $k$th largest eigenvalue of the Laplacian of the graph, which is of independent 
interest since most previous spectral characterizations of combinatorial objects only use the first $k$ 
smallest eigenvalues. 

1 Introduction 

Community detection and characterization have stimulated widespread interest in modern network 
science, which has been a very active research area due to the proliferation of very large social and 
technological networks over the past few years. In the computer science literature, communities are 
often described as locally dense subgraphs whose vertices are densely connected with each other while 
loosely connected to the rest of the graph. Communities convey valuable information on both 
the structures and dynamics of networks, and have found applications in market advertising, rumor 
spreading, ranking web pages and so on. For more motivations and detection methods, see the recent 
surveys [Sch07, POM09, For10]. 

In this paper, we focus on the problem of searching for and characterizing cyber-communities, which, 
as argued by Kumar et al. [KRRT99], are well characterized by dense bipartite subgraphs due to the 
particular phenomenon of heavy co-citation among related web pages in the Web, that is, related pages 
are frequently referenced together. Here a dense bipartite subgraph refers to a subgraph that is sparsely 
connected to the outside and can be partitioned into two disjoint vertex sets $L, R$ such that many of 
the possible edges between $L$ and $R$ are present. Since the work of Kumar et al. [KRRT99], 
practitioners have proposed a large set of simple and efficient heuristic methods to extract this kind of 
subgraph (e.g., [KMS04, DGP09]). These heuristics are often case-by-case and experimental. On the 
other hand, to our knowledge, theoreticians have only studied extreme cases of dense bipartite 
subgraphs, e.g., maximum edge bicliques [Pee03], which are far from the true cyber-communities 
detected in the Web; there is no algorithm with provable approximation or running time 
guarantees for a more suitable measure of a set being a dense bipartite subgraph. 

Let us elaborate more on how to measure a dense bipartite-like subgraph. Let $G = (V, E)$ be an 
undirected graph representing a Web graph, in which an edge between two nodes indicates the existence 
of a hyperlink between the corresponding two web pages. (We ignore the direction of the links.) As 
stated above, a dense bipartite-like subgraph is a pair of disjoint vertex subsets $L, R$ such that 'most' of 
the edges involving the vertices in $U := L \cup R$ lie between $L$ and $R$. Equivalently, we say that $L, R$ 
form a dense bipartite subgraph if 'few' edges lie entirely within $L$ or $R$, or leave $U$ for the rest of the graph. 
The latter formulation turns out to be well captured by the bipartiteness ratio (bipartiteness for short) 
of $L, R$, a measure introduced by Trevisan with a totally different motivation: to serve as a 
subroutine for designing approximation algorithms for the Max Cut problem [Tre09]. The bipartiteness of 
$L, R$ is defined as 

$$\beta(L, R) := \frac{2e(L) + 2e(R) + e(U, \bar{U})}{\mathrm{vol}(U)},$$

where $e(L)$ and $e(U, \bar{U})$ denote the number of edges in $L$ and the number of edges leaving $U$ for the 
rest of the graph, respectively; and $\mathrm{vol}(U)$, called the volume of $U$, is defined to be the sum of the degrees 
of the vertices in $U$. Notice that the numerator involves all the edges that are not between $L$ and $R$, and the 
denominator involves all the edges incident to $L \cup R$. Intuitively, the smaller the bipartiteness, the 
more closely $(L, R)$ resembles a dense bipartite subgraph. 
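
To make the definition concrete, the following minimal sketch computes the bipartiteness of a disjoint pair $(L, R)$ directly from the formula above. It assumes an unweighted graph stored as a dictionary mapping each vertex to the set of its neighbours; the function name `bipartiteness` and the two toy graphs are illustrative choices, not part of the paper.

```python
def bipartiteness(adj, L, R):
    """Bipartiteness ratio of a disjoint pair (L, R) in an unweighted graph.

    adj maps each vertex to the set of its neighbours (assumed symmetric).
    """
    L, R = set(L), set(R)
    if L & R:
        raise ValueError("L and R must be disjoint")
    U = L | R
    vol = sum(len(adj[v]) for v in U)  # vol(U): sum of degrees in U
    if vol == 0:
        raise ValueError("vol(U) must be positive")
    # Ordered pairs inside one side: every such edge is seen from both
    # endpoints, which gives 2e(L) + 2e(R).
    inside = sum(1 for S in (L, R) for u in S for v in adj[u] if v in S)
    # Edges with exactly one endpoint in U are seen once: e(U, complement).
    leaving = sum(1 for u in U for v in adj[u] if v not in U)
    return (inside + leaving) / vol


# A 4-cycle is bipartite and has no outside edges, so the ratio is 0.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
# A triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 0.
tri = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(bipartiteness(cycle, {0, 2}, {1, 3}))
print(bipartiteness(tri, {0}, {1, 2}))  # (2*0 + 2*1 + 1) / 7
```

In the second example the edge inside $R = \{1, 2\}$ is counted twice and the pendant edge once, over a volume of 7.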

Thus, we will use the bipartiteness as a measure of a set being dense bipartite-like. We want to extract 
subgraphs with small bipartiteness, which correspond to good cyber-communities. Furthermore, we are 
interested in finding small communities, which generally contain more interesting and substantial 
information than large communities, partly due to the hierarchical organization of the community structure 
in networks, that is, large communities usually consist of small ones. Furthermore, Leskovec et 
al. [LLDM09, LLM10] find that in many large-scale networks, the sets which most resemble 
communities are of size around 100, which is rather small compared to the size of the network. There is also 
experimental evidence and common experience that a significant fraction of nodes in networks belong to 
some small communities, which is mathematically characterized as the small community phenomenon in 
networks [LP11, LP12]. 

In order to make our algorithm practical, we would like to design a local algorithm to extract 
subgraphs with small bipartiteness. A local algorithm, introduced by Spielman and Teng [ST04], is one that, 
given a vertex as input, explores only a small portion of the graph and finds a subgraph with a good 
property. Such algorithms have found applications in graph sparsification, solving linear equations [Spi10], and 
designing near-linear time algorithms [Ten10]. Local algorithms have also been shown to be both effective and 
efficient on real network data (e.g., [LLDM09, LLB+09]). 



1.1 Our Results 

We give approximation and local algorithms, as well as a spectral characterization, for the problem of 
finding small subgraphs with small bipartiteness, with the goal, as argued above, of extracting small 
cyber-communities. In the following, we will use the term small dense bipartite-like subgraphs to refer 
to small subgraphs with small bipartiteness. 

• We first give a bicriteria approximation algorithm for finding a small dense bipartite-like 
subgraph, and thus for determining the dense bipartite profile of the graph, which is defined as 

$$\beta(k) := \min_{\substack{L, R:\ L \cap R = \emptyset \\ \mathrm{vol}(L \cup R) \le k}} \beta(L, R).$$

More specifically, we give a polynomial time algorithm SwpDB such that for any $0 < \epsilon < 1/2$, if 
the graph contains a subgraph $S$ with volume at most $k$ and bipartiteness $\theta$, then it finds a subgraph 
$X$ with volume at most $2k^{1+\epsilon}$ and bipartiteness at most $4\sqrt{\theta/\epsilon}$. 




Note that the approximation ratio does not depend on the size of the graph, since the algorithm is 
based on a spectral characterization of the bipartiteness of the graph given by Trevisan [Tre09] (see 
Lemma 1), which is analogous to Cheeger's inequality for conductance (see more discussion 
below). 

• By incorporating a truncation operation we are able to give a local algorithm for the dense bipartite 
subgraphs. We show that if the graph contains a subgraph $S$ with volume at most $k$ and 
bipartiteness at most $\theta$, then there exists a subgraph $S_\theta \subseteq S$ with volume at least $\mathrm{vol}(S)/2$ such that if 
our local algorithm LocDB takes as input a vertex $v \in S_\theta$, then for any $0 < \epsilon < 1/2$, it finds a 
subgraph $X$ with volume at most $O(k^{1+\epsilon})$ and bipartiteness at most $O(\sqrt{\theta/\epsilon})$, with running time 
$O(\epsilon^2\theta^{-2}k^{1+\epsilon}\ln^3 k)$, independent of the size of the graph. We remark that the algorithm runs in 
sublinear time (in the size of the input graph, denoted as $n$) when the size of the optimal set is 
sufficiently smaller than $n$, and that the approximation ratio of the algorithm is almost optimal in that 
it almost matches the guarantee of Trevisan's spectral inequality for the bipartiteness. 

• Finally, as an application of the algorithm SwpDB, we give a spectral characterization of the small 
dense bipartite subgraph. Let $\lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$ be the eigenvalues of the Laplacian matrix $\mathcal{L}$ 
of the graph $G$. We show that if $\lambda_{n-k} \ge 2 - 2\eta$, then there is a polynomial time algorithm such 
that for any $0 < \epsilon < 1$, it finds a subgraph with volume at most $O(\mathrm{vol}(G)/k^{1-\epsilon})$ and bipartiteness 
at most $O(\sqrt{(\eta/\epsilon)\log_k n})$, where $\mathrm{vol}(G)$ is the total degree of the vertices in $G$. One can interpret the 
result as 

$$\beta(\mathrm{vol}(G)/k^{1-\epsilon}) \le O\left(\sqrt{(2 - \lambda_{n-k})\log_k n}\right).$$

Note that we relate the $k$th largest eigenvalue of $\mathcal{L}$ to a combinatorial object (in this case, 
the small dense bipartite subgraph), which is of independent interest as previous works mostly 
use only the first $k$ smallest eigenvalues to characterize combinatorial objects (e.g., small set 
expanders) in graphs (see more discussion below). 

1.2 Our Techniques 

Our approximation algorithm is based on Trevisan's spectral characterization of the bipartiteness $\beta(G)$ 
of the graph, which is the minimum bipartiteness over all possible disjoint vertex subsets $L, R$, that is, 
$\beta(G) = \beta(\mathrm{vol}(G))$. Recall that $\lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$ are the eigenvalues of $\mathcal{L}$. Instead of working 
directly on $\mathcal{L}$, we study a closely related matrix $M$, which we call the quasi-Laplacian, that has the same 
spectrum as $\mathcal{L}$. Let $\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_{n-1}$ be the corresponding eigenvectors of $M$. Trevisan showed that if 
$\lambda_{n-1} \ge 2 - 2\theta$, then by a simple sweeping process over the largest eigenvector $\mathbf{v}_{n-1}$, we can find a 
pair of subsets $X, Y$ with bipartiteness at most $2\sqrt{\theta}$. On the other hand, it is well known that the largest 
eigenvector $\mathbf{v}_{n-1}$ can be computed fast by the power method, which starts with a "good" vector $\mathbf{q}_0$, 
iteratively multiplies it by $M$ to obtain $\mathbf{q}_t$, and outputs $\mathbf{q}_T$ for a properly chosen $T$. Hence, the power 
method combined with the sweep process can find a subset with bipartiteness close to $\beta(G)$. However, 
such a method does not give a useful volume bound on the output set. 

In order to find small dense bipartite subgraphs, we sweep each of the vectors $\mathbf{q}_t$ and characterize 
$\mathbf{q}_t$ in terms of the minimum bipartiteness of all the small sweep sets (the sets found in the sweeping 
process) encountered in all the $T$ iterations. This is done by a potential function $J(\mathbf{p}, x)$, which has 
a nice convergence property: for a general vector $\mathbf{p}$ and some $x$, $J(\mathbf{p}M, x)$ can be bounded by a 
function of $J(\mathbf{p}, x')$ and the bipartiteness of some sweep set (see Lemma 2). Using this property, 
we show inductively that if we choose $\mathbf{q}_0 = \chi_v$ for some vertex $v \in V$, then $J(\mathbf{q}_t, x)$ can be upper bounded 
by a function of $t$, $K$ and the minimum bipartiteness of all the sweep sets of volume at most $K$ for 
all $t \le T$ (see Lemma 3). On the other hand, if the graph contains a small dense bipartite subgraph 
$(L, R)$ of volume at most $k$, we prove that the potential function also increases quickly in terms of $t$ and 
$\beta(L, R)$ (see Lemma 4), which leads to the conclusion that at least one of the sweep sets with volume 
at most $K$ has bipartiteness "close" to $\beta(L, R)$, by choosing a proper $K$ in terms of $k$ and the starting 
vertex $v$. 

To give local algorithms that run in time independent of the size of the graph, we need to keep the 
support size of the vectors small in each iteration. This is done by a truncation operation on a vector 
that only keeps the entries with large absolute value (relative to the degree). Let $\tilde{\mathbf{q}}_0 = \chi_v$ and iteratively define $\tilde{\mathbf{q}}_t$ to 
be the truncation of $\tilde{\mathbf{q}}_{t-1}M$. We show that both the upper bound and the lower bound on $J(\mathbf{q}_t, x)$ still 
approximately hold for $J(\tilde{\mathbf{q}}_t, x)$, and thus prove the correctness of our local algorithm, which sweeps all 
the vectors $\tilde{\mathbf{q}}_t$ instead of $\mathbf{q}_t$. 

Finally, we use a simple trace lower bound to serve as the lower bound for $J(\mathbf{q}_t, x)$ and obtain the 
spectral characterization of the dense bipartite profile. 



1.3 Related Works 

Our work is closely related to a line of research on the conductance of a set $S$, which is defined as 

$$\phi(S) := \frac{e(S, \bar{S})}{\min\{\mathrm{vol}(S), \mathrm{vol}(\bar{S})\}}.$$

Kannan, Vempala and Vetta [KVV04] suggest using the conductance as a measure of a set being a general 
community (in contrast to cyber-communities), since the smaller the conductance is, the more likely it is that 
the set is a community with dense intra-connections and sparse inter-connections. Spielman and Teng 
give the first local clustering algorithm to find subgraphs with small conductance by using truncated 
random walks [ST04, ST08]. Andersen, Chung and Lang [ACL06], Andersen and Peres [AP09], Kwok 
and Lau [KL12] and Oveis Gharan and Trevisan [OT12] then give local algorithms for conductance with 
better approximation ratio or running time. All these local algorithms are based on Cheeger's 
inequality, which relates the second smallest eigenvalue of $\mathcal{L}$ to the conductance [AM85, Alo86, SJ89], similar to 
our algorithms, which depend on Trevisan's spectral inequality relating the largest eigenvalue of $\mathcal{L}$ to 
the bipartiteness. 

Some works studied small set expanders, that is, the problem of finding a small set with small 
conductance. This problem is of interest not only because it has applications in finding small 
communities, but also because it is closely related to the unique games conjecture [RS10]. Arora, Barak 
and Steurer [ABS10], Louis, Raghavendra, Tetali and Vempala [LRTV12], Lee, Oveis Gharan and 
Trevisan [LOT12], Kwok and Lau [KL12], Oveis Gharan and Trevisan [OT12] and O'Donnell and 
Witmer [OW12] have given spectrum-based approximation algorithms and characterizations for this problem. 
The latter three works have recently shown that for any $0 < \epsilon < 1$, 

$$\phi(\mathrm{vol}(G)/k^{1-\epsilon}) \le O\left(\sqrt{\lambda_k \log_k n}\right),$$

where $\phi(k)$ is the expansion profile of $G$ and is defined as 

$$\phi(k) := \min_{S:\ \mathrm{vol}(S) \le k} \phi(S).$$

Their spectral characterization of the expansion profile as well as Cheeger's inequality all use the first 
$k$ smallest eigenvalues of $\mathcal{L}$, which is comparable to our characterization of the dense bipartite profile by 
the $k$th largest eigenvalue of $\mathcal{L}$. 

Peng [Pen12] has given a local algorithm for the dense bipartite subgraphs. His algorithm is 
guaranteed to output a set with volume at most $O(k^2)$ and bipartiteness $O(\sqrt{\theta})$, which is worse than the 
approximation guarantee of our local algorithm when $\epsilon < 1/2$ is a constant. 



2 Preliminaries 

Let $G = (V, E)$ be an undirected weighted graph and let $n := |V|$ and $m := |E|$. Let $d(v)$ denote the 
weighted degree of vertex $v$. For any vertex subset $S \subseteq V$, let $\bar{S} := V \setminus S$ denote the complement of 
$S$. Let $e(S)$ be the number of edges in $S$ and define the volume of $S$ to be the sum of the degrees of the vertices 
in $S$, that is, $\mathrm{vol}(S) := \sum_{v \in S} d(v)$. Let $\mathrm{vol}(G) := \mathrm{vol}(V) = 2m$. For any two subsets $L, R \subseteq V$, let 
$e(L, R)$ denote the number of edges between $L$ and $R$. For two disjoint subsets $L, R$, that is, $L \cap R = \emptyset$, 
we will use $U = (L, R)$ to denote the subgraph induced on $L$ and $R$, which is also called the pair subgraph. 
We will also use $U$ to denote $L \cup R$. Given $U = (L, R)$, the bipartiteness (ratio) of $U$ is defined as 

$$\beta(L, R) := \frac{2e(L) + 2e(R) + e(U, \bar{U})}{\mathrm{vol}(U)}.$$

The bipartiteness of a set $S$ is defined to be the minimum value of $\beta(L, R)$ over all the possible 
partitions $L, R$ of $S$, that is, 

$$\beta(S) := \min_{(L, R)\ \text{partition of}\ S} \beta(L, R).$$

The bipartiteness of the graph $G$ is defined as 

$$\beta(G) := \beta(\mathrm{vol}(G)) = \min_{S \subseteq V} \beta(S).$$

We are interested in finding small subgraphs with small bipartiteness. In the following, we use 
lowercase bold letters to denote vectors. Unless otherwise specified, a vector $\mathbf{p}$ is considered to be a row 
vector, and $\mathbf{p}^T$ is its transpose. For a vector $\mathbf{p}$ on vertices, let $\mathrm{supp}(\mathbf{p})$ denote the support of $\mathbf{p}$, that is, 
the set of vertices on which the $\mathbf{p}$ value is nonzero. Let $\|\mathbf{p}\|_1$ and $\|\mathbf{p}\|_2$ denote the $L_1$ and $L_2$ norm 
of $\mathbf{p}$, respectively. Let $|\mathbf{p}|$ denote its absolute vector, that is, $|\mathbf{p}|(v) := |\mathbf{p}(v)|$. For a vector $\mathbf{p}$ and 
a vertex subset $S$, let $\mathbf{p}(S) := \sum_{v \in S} \mathbf{p}(v)$. For two vertex subsets $L, R$, let $\mathbf{p}(L, -R) := \sum_{v \in L} \mathbf{p}(v) - \sum_{v \in R} \mathbf{p}(v)$. 
One useful observation is that for any partition $(L, R)$ of $S$, $\mathbf{p}(L, -R) \le |\mathbf{p}|(S)$. Also note that there 
exists a partition $(L_0, R_0)$ of $S$ such that $\mathbf{p}(L_0, -R_0) = |\mathbf{p}|(S)$. Indeed, $L_0$ is the set of vertices with 
positive $\mathbf{p}$ value and $R_0$ is the set of the remaining vertices, that is, $L_0 = \{v \in S : \mathbf{p}(v) > 0\}$ and 
$R_0 = \{v \in S : \mathbf{p}(v) \le 0\}$. 

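For any vertex $v$, let $\chi_v$ denote the indicator vector on $v$. Let $\mathbf{1}$ denote the all-ones vector. For a set 
$U = (L, R)$, define $\mathbf{p}_U$ and $\psi_U$ as 

$$\mathbf{p}_U(v) = \begin{cases} d(v)/\mathrm{vol}(U) & \text{if } v \in L, \\ -d(v)/\mathrm{vol}(U) & \text{if } v \in R, \\ 0 & \text{otherwise,} \end{cases} \qquad \psi_U(v) = \begin{cases} \sqrt{d(v)/\mathrm{vol}(U)} & \text{if } v \in L, \\ -\sqrt{d(v)/\mathrm{vol}(U)} & \text{if } v \in R, \\ 0 & \text{otherwise.} \end{cases}$$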

Now let $A$ denote the adjacency matrix of the graph such that $A_{uv}$ is the weight of edge $u \sim v$. Let 
$D$ denote the diagonal degree matrix. Define the random walk matrix $W$, the (normalized) Laplacian 
matrix $\mathcal{L}$ and the quasi-Laplacian matrix $M$ of the graph $G$ as 

$$W := D^{-1}A, \qquad \mathcal{L} := I - D^{-1/2}AD^{-1/2}, \qquad M := I - D^{-1}A.$$

It is well known that these three matrices are closely related. In particular, if we let $\lambda_0 \le \lambda_1 \le 
\cdots \le \lambda_{n-1}$ be the eigenvalues of $\mathcal{L}$, then $\{1 - \lambda_i\}_{0 \le i \le n-1}$ and $\{\lambda_i\}_{0 \le i \le n-1}$ are the eigenvalues of $W$ 
and $M$, respectively. In this paper, we will mainly use the quasi-Laplacian $M$ to give both algorithms 
and a spectral characterization for the small dense bipartite subgraph problem. If we let $\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_{n-1}$ 
be the corresponding eigenvectors of $M$, then we have the following spectral inequality given by 
Trevisan [Tre09] (see also [Pen12]). 

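Lemma 1 ([Tre09]). Let $\beta(G)$, $\lambda_{n-1}$ and $\mathbf{v}_{n-1}$ be defined as above. We have that 

$$\beta(G) \le \sqrt{2(2 - \lambda_{n-1})}. \qquad (1)$$

Furthermore, a pair subgraph $(X, Y)$ with bipartiteness at most $\sqrt{2(2 - \lambda_{n-1})}$ can be found by a sweeping process 
over $\mathbf{v}_{n-1}$. 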
The sweeping process mentioned above is defined as follows. 

Definition 1. (Sweep process) Given a vector $\mathbf{p}$, the sweep (process) over $\mathbf{p}$ is defined by performing the 
following operations: 

• Order the vertices so that 

$$\frac{|\mathbf{p}(v_1)|}{d(v_1)} \ge \frac{|\mathbf{p}(v_2)|}{d(v_2)} \ge \cdots \ge \frac{|\mathbf{p}(v_n)|}{d(v_n)}.$$

• For each $i \le n$, let $L_i(\mathbf{p}) := \{v_j : \mathbf{p}(v_j) > 0 \text{ and } j \le i\}$, $R_i(\mathbf{p}) := \{v_j : \mathbf{p}(v_j) \le 0 \text{ and } j \le i\}$ 
and $S_i(\mathbf{p}) := (L_i(\mathbf{p}), R_i(\mathbf{p}))$, which we call the sweep set of the first $i$ vertices. Compute the 
bipartiteness of $S_i(\mathbf{p})$. 

In Trevisan's inequality, to find a subgraph with small bipartiteness, we just need to output the 
sweep set with the minimum bipartiteness over all the sweep sets. Trevisan also showed the tightness 
(within constant factors) of inequality (1), in the sense that there exist graphs for which the two quantities 
on both sides of the inequality are asymptotically the same. The sweeping process as well as Trevisan's 
inequality are the bases of our algorithms for the small dense bipartite-like subgraphs. 
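
The sweep process can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the graph is unweighted and stored as a neighbour-set dictionary, vertices with $\mathbf{p}(v) = 0$ are skipped (so only the support is swept), all degrees are positive, and ties in the ordering are broken arbitrarily. The names `sweep` and `best_sweep_set` are hypothetical.

```python
def bipartiteness(adj, L, R):
    """(2e(L) + 2e(R) + e(U, complement of U)) / vol(U) for U = L | R."""
    U = L | R
    vol = sum(len(adj[v]) for v in U)
    inside = sum(1 for S in (L, R) for u in S for v in adj[u] if v in S)
    leaving = sum(1 for u in U for v in adj[u] if v not in U)
    return (inside + leaving) / vol


def sweep(adj, p):
    """Yield (L_i, R_i, bipartiteness) for the sweep over the support of p."""
    support = [v for v in p if p[v] != 0]
    # Decreasing order of |p(v)| / d(v).
    order = sorted(support, key=lambda v: abs(p[v]) / len(adj[v]), reverse=True)
    L, R = set(), set()
    for v in order:
        (L if p[v] > 0 else R).add(v)  # positive entries go left
        yield frozenset(L), frozenset(R), bipartiteness(adj, L, R)


def best_sweep_set(adj, p):
    """Sweep set of minimum bipartiteness among all sweep sets of p."""
    return min(sweep(adj, p), key=lambda item: item[2])


# Path 0 - 1 - 2 with a vector whose signs alternate along the path.
path = {0: {1}, 1: {0, 2}, 2: {1}}
p = {0: 0.5, 1: -1.2, 2: 0.1}
for L, R, beta in sweep(path, p):
    print(sorted(L), sorted(R), round(beta, 3))
```

On this path the vertices are visited in the order 1, 0, 2, and the last sweep set $(\{0, 2\}, \{1\})$ is exactly the bipartition of the path, with bipartiteness 0.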

We will use the following truncation operator to design local algorithms. 

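Definition 2. (Truncation operator) Given a vector $\mathbf{p}$ and a nonnegative real number $\xi$, we define the 
$\xi$-truncated vector of $\mathbf{p}$ to be: 

$$[\mathbf{p}]_\xi(u) := \begin{cases} \mathbf{p}(u) & \text{if } |\mathbf{p}(u)| \ge \xi d(u), \\ 0 & \text{otherwise.} \end{cases}$$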

The following facts are straightforward. 

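Fact 1. For any vector $\mathbf{p}$ and $0 < \xi \le 1$, 

1. $|[\mathbf{p}]_\xi| \le |\mathbf{p}| \le |[\mathbf{p}]_\xi| + \xi\mathbf{d}$, where $\mathbf{d}$ is the degree vector. 

2. $\mathrm{vol}(\mathrm{supp}([\mathbf{p}]_\xi)) = \sum_{v \in \mathrm{supp}([\mathbf{p}]_\xi)} d(v) \le \sum_{v \in \mathrm{supp}([\mathbf{p}]_\xi)} |\mathbf{p}(v)|/\xi \le \|\mathbf{p}\|_1/\xi$. 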
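
A minimal sketch of the truncation operator on a sparse vector, assuming the vector is stored as a dictionary of nonzero entries and that an entry is kept when $|\mathbf{p}(u)| \ge \xi d(u)$; the name `truncate` and the toy data are illustrative. The two checks at the end mirror the two parts of Fact 1 on this example.

```python
def truncate(p, deg, xi):
    """xi-truncation: keep p(u) where |p(u)| >= xi * d(u), drop it otherwise."""
    return {u: x for u, x in p.items() if abs(x) >= xi * deg[u]}


deg = {"a": 2, "b": 1, "c": 4}
p = {"a": 0.5, "b": -0.05, "c": 0.2}
xi = 0.1
t = truncate(p, deg, xi)
print(t)  # only "a" clears its threshold xi * d(a) = 0.2

# Part 1 of Fact 1: entrywise, |[p]| <= |p| <= |[p]| + xi * d.
part1 = all(abs(t.get(u, 0.0)) <= abs(p[u]) <= abs(t.get(u, 0.0)) + xi * deg[u]
            for u in p)
# Part 2 of Fact 1: the support of the truncated vector has small volume.
part2 = sum(deg[u] for u in t) <= sum(abs(x) for x in p.values()) / xi
print(part1, part2)
```

The second property is what keeps each iteration of a truncated power method local: the work per step is bounded by the volume of the support.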



3 Approximation Algorithm for the Small Dense Bipartite-like Subgraphs 

In this section, we first give the description of our approximation algorithm for the small dense 
bipartite-like subgraph, the main subroutine of which is the sweeping process over a set of vectors $\chi_v M^t$. We 
then introduce a potential function $J(\mathbf{p}, x)$ and give both an upper bound and a lower bound on the potential 
function $J(\chi_v M^t, x)$ under certain conditions, using which we are able to show the correctness of our 
algorithm. 



3.1 Description of the Algorithm and the Main Theorem 

Now we describe our algorithm SwpDB (short for "sweep for dense bipartite") for finding the small 
dense bipartite-like subgraphs. 

SwpDB($k, \theta, \epsilon$) 

Input: A target volume $k$, a target bipartiteness $\theta$, an error parameter $\epsilon < 1/2$. 
Output: A subgraph $(X, Y)$. 

1. Let $T = \lfloor \epsilon\ln(ck)/(2\theta) \rfloor$, where $c$ is some constant such that $c^{-\epsilon} - c^{-1} > 1/2$. Let $K = 2k^{1+\epsilon}$. 

2. Sweep over all vectors $\chi_v M^t$, for each vertex $v \in V$ and $t \le T$, to obtain a family $\mathcal{F}$ of 
sweep sets with volume at most $K$. 

3. Output the subgraph $(X, Y)$ with the smallest bipartiteness ratio among all sets in $\mathcal{F}$. 
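
The three steps above can be sketched as follows. This is an illustrative sketch, not the authors' code: the graph is unweighted and stored as a neighbour-set dictionary, the number of iterations `T` and the volume bound `K` are passed in explicitly instead of being derived from $k$, $\theta$ and $\epsilon$, and the helper names (`apply_M`, `swpdb`) are hypothetical.

```python
def apply_M(adj, p):
    """Row-vector product pM for M = I - D^{-1}A on a sparse dict vector."""
    q = dict(p)
    for u, pu in p.items():
        share = pu / len(adj[u])  # p(u)/d(u) travels along every edge u -> v
        for v in adj[u]:
            q[v] = q.get(v, 0.0) - share
    return q


def bipartiteness(adj, L, R):
    U = L | R
    vol = sum(len(adj[v]) for v in U)
    inside = sum(1 for S in (L, R) for u in S for v in adj[u] if v in S)
    leaving = sum(1 for u in U for v in adj[u] if v not in U)
    return (inside + leaving) / vol


def swpdb(adj, T, K):
    """Best sweep set of volume <= K over chi_v M^t, all v and 0 <= t <= T."""
    best = None
    for v in adj:
        q = {v: 1.0}  # indicator vector chi_v
        for t in range(T + 1):
            if t > 0:
                q = apply_M(adj, q)
            order = sorted((u for u in q if q[u] != 0),
                           key=lambda u: abs(q[u]) / len(adj[u]), reverse=True)
            L, R, vol = set(), set(), 0
            for u in order:
                vol += len(adj[u])
                if vol > K:
                    break  # sweep sets are nested, so later ones are larger
                (L if q[u] > 0 else R).add(u)
                beta = bipartiteness(adj, L, R)
                if best is None or beta < best[2]:
                    best = (frozenset(L), frozenset(R), beta)
    return best


# A 4-cycle on {0, 1} x {2, 3}, joined by the edge 3-4 to a triangle {4, 5, 6}.
graph = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1, 4},
         4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
X, Y, beta = swpdb(graph, T=3, K=9)
print(sorted(X), sorted(Y), beta)
```

On this toy graph the planted pair $(\{0, 1\}, \{2, 3\})$ has volume 9 and a single leaving edge, so its bipartiteness is 1/9, and the sweep over $\chi_0 M^2$ already recovers it.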
Our main theorem for this algorithm is as follows. 

Theorem 1. Assume that $G$ has a set $U = (L, R)$ such that $\beta(L, R) \le \theta$ and $\mathrm{vol}(U) \le k$. Then for any 
$0 < \epsilon < 1/2$, the algorithm SwpDB($k, \theta, \epsilon$) runs in polynomial time and finds a set $(X, Y)$ such that 
$\mathrm{vol}(X \cup Y) \le 2k^{1+\epsilon}$ and $\beta(X, Y) \le 4\sqrt{\theta/\epsilon}$. 

3.2 A Potential Function 

We define a potential function $J : [0, 2m] \to \mathbb{R}^+$: 

$$J(\mathbf{p}, x) := \max_{\substack{w \in [0,1]^n \\ \sum_u w(u)d(u) = x}} \sum_{u} |\mathbf{p}(u)|\, w(u).$$

Note that our potential function is similar to a potential function for bounding the convergence of 
$\mathbf{p}\left(\frac{I + W}{2}\right)^t$ in terms of the conductance, given by Lovász and Simonovits [LS90, LS93]. Here we will 
use $J(\mathbf{p}, x)$ to bound the convergence of $\mathbf{q}M^t$ in terms of the bipartiteness of the sweep sets. 
There are two useful ways to see this potential function: 

• We view each edge $u \sim v \in E$ as two directed edges $u \to v$ and $v \to u$. For each directed edge 
$e = u \to v$, let $\mathbf{p}(e) = \frac{\mathbf{p}(u)}{d(u)}$. Order the edges so that 

$$|\mathbf{p}(e_1)| \ge |\mathbf{p}(e_2)| \ge \cdots \ge |\mathbf{p}(e_{2m})|.$$

Now we can see that for an integer $x$, $J(\mathbf{p}, x) = \sum_{j=1}^{x} |\mathbf{p}(e_j)|$. For other, fractional $x = \lfloor x \rfloor + r$, 
$J(\mathbf{p}, x) = (1 - r)J(\mathbf{p}, \lfloor x \rfloor) + rJ(\mathbf{p}, \lceil x \rceil)$. 

Also it is easy to see that for any directed edge set $F$, $|\mathbf{p}|(F) := \sum_{e \in F} |\mathbf{p}(e)| \le J(\mathbf{p}, |F|)$, since 
the former is a sum of $|\mathbf{p}|$ values over one specific set of $|F|$ edges and the latter is the 
maximum over all such possible edge sets. 

• Another way to view the potential function is to use the sweep process over $\mathbf{p}$ as in Definition 1. 
By the definitions of the potential function and the sweep process, we have the following 
observations. 

1. For $x = \mathrm{vol}(S_i(\mathbf{p}))$, we have $J(\mathbf{p}, x) = \sum_{j=1}^{i} |\mathbf{p}(v_j)| = |\mathbf{p}|(S_i(\mathbf{p})) = \mathbf{p}(L_i(\mathbf{p}), -R_i(\mathbf{p}))$, and 
$J(\mathbf{p}, x)$ is linear between two consecutive such values of $x$. 

2. For any set $S$, $|\mathbf{p}|(S) \le J(\mathbf{p}, \mathrm{vol}(S))$, since the former is the sum of the $|\mathbf{p}|$ values over one 
specific set of volume $\mathrm{vol}(S)$ and the latter is the maximum of such sums over all (fractional) sets of that volume. 

From both views, we can easily see that the potential function is a non-decreasing and concave 
function of $x$. 
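
The sweep-order view gives a direct way to evaluate the potential function. The sketch below assumes that the maximum in the definition of $J$ ranges over weights $w \in [0,1]^n$ with $\sum_u w(u)d(u) = x$, which is the reading consistent with the edge view above; under that assumption $J$ is a fractional knapsack and is solved greedily in decreasing order of $|\mathbf{p}(v)|/d(v)$. The function name `potential` and the toy data are illustrative.

```python
def potential(p, deg, x):
    """J(p, x): greedy fill of volume x in decreasing order of |p(v)| / d(v)."""
    items = sorted(((abs(pv) / deg[v], deg[v]) for v, pv in p.items()),
                   reverse=True)
    total, remaining = 0.0, x
    for ratio, d in items:
        take = min(d, remaining)  # the last vertex may be taken fractionally
        total += ratio * take
        remaining -= take
        if remaining <= 0:
            break
    return total


deg = {"a": 1, "b": 2, "c": 1}
p = {"a": 0.5, "b": -1.2, "c": 0.1}
for x in (1, 2, 2.5, 3, 4):
    print(x, round(potential(p, deg, x), 3))
```

On this example the breakpoints are at the volumes of the sweep sets (2, 3 and 4), the function is linear in between, and $J(\mathbf{p}, 2m) = \|\mathbf{p}\|_1$, in line with the observations above.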

3.3 An Upper Bound for the Potential Function 

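Now we upper bound $J(\mathbf{p}M, x)$ in terms of $J(\mathbf{p}, x')$ and the bipartiteness of the sweep sets of $\mathbf{p}M$. 

Lemma 2 (Convergence Lemma). For an arbitrary vector $\mathbf{p}$ on vertices, if $\beta(L_i(\mathbf{p}M), R_i(\mathbf{p}M)) \ge \Theta$, then 
for $x = \mathrm{vol}(S_i(\mathbf{p}M))$, 

$$J(\mathbf{p}M, x) \le J(\mathbf{p}, x + \Theta x) + J(\mathbf{p}, x - \Theta x).$$

Proof. We show that for any $U = (L, R)$, we have that 

$$\mathbf{p}M(L, -R) \le J(\mathbf{p}, \mathrm{vol}(U)(1 + \beta(L, R))) + J(\mathbf{p}, \mathrm{vol}(U)(1 - \beta(L, R))). \qquad (2)$$

Then the lemma follows by letting $U = S_i(\mathbf{p}M) = (L_i(\mathbf{p}M), R_i(\mathbf{p}M))$ and noting that 

$$\begin{aligned}
J(\mathbf{p}M, x) &= \mathbf{p}M(L_i(\mathbf{p}M), -R_i(\mathbf{p}M)) \\
&\le J(\mathbf{p}, x(1 + \beta(L_i(\mathbf{p}M), R_i(\mathbf{p}M)))) + J(\mathbf{p}, x(1 - \beta(L_i(\mathbf{p}M), R_i(\mathbf{p}M)))) \\
&\le J(\mathbf{p}, x(1 + \Theta)) + J(\mathbf{p}, x(1 - \Theta)),
\end{aligned}$$

where the last inequality follows from the concavity of $J(\mathbf{p}, x)$. 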

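Now we show inequality (2). Let $L_1 \to L_2$ denote the set of directed edges from $L_1$ to $L_2$ for two 
arbitrary vertex sets $L_1$ and $L_2$. We have that 

$$\begin{aligned}
\mathbf{p}M(L, -R) &= \mathbf{p}(I - D^{-1}A)(L, -R) \\
&= \mathbf{p}(L) - \mathbf{p}(R) - \mathbf{p}D^{-1}A(L) + \mathbf{p}D^{-1}A(R) \\
&= \sum_{v \in L}\sum_{u \sim v}\frac{\mathbf{p}(v)}{d(v)} - \sum_{v \in R}\sum_{u \sim v}\frac{\mathbf{p}(v)}{d(v)} - \sum_{v \in L}\sum_{u \sim v}\frac{\mathbf{p}(u)}{d(u)} + \sum_{v \in R}\sum_{u \sim v}\frac{\mathbf{p}(u)}{d(u)} \\
&= \sum_{e \in L \to \bar{U}} \mathbf{p}(e) - \sum_{e \in R \to \bar{U}} \mathbf{p}(e) - 2\sum_{e \in R \to L} \mathbf{p}(e) - \sum_{e \in \bar{U} \to L} \mathbf{p}(e) + 2\sum_{e \in L \to R} \mathbf{p}(e) + \sum_{e \in \bar{U} \to R} \mathbf{p}(e) \\
&\le \sum_{e \in (L \to R) \cup (R \to L) \cup (U \to \bar{U}) \cup (\bar{U} \to U)} |\mathbf{p}(e)| + \sum_{e \in (L \to R) \cup (R \to L)} |\mathbf{p}(e)| \\
&\le J(\mathbf{p}, 2e(L, R) + 2e(U, \bar{U})) + J(\mathbf{p}, 2e(L, R)) \\
&\le J(\mathbf{p}, \mathrm{vol}(U) + 2e(L) + 2e(R) + e(U, \bar{U})) + J(\mathbf{p}, \mathrm{vol}(U) - 2e(L) - 2e(R) - e(U, \bar{U})),
\end{aligned}$$

where the second to last inequality follows from the facts that $|(L \to R) \cup (R \to L) \cup (U \to \bar{U}) \cup (\bar{U} \to U)| = 
2e(L, R) + 2e(U, \bar{U})$, that $|(L \to R) \cup (R \to L)| = 2e(L, R)$ and that $|\mathbf{p}|(F) \le J(\mathbf{p}, |F|)$ for an 
arbitrary (directed) edge set $F$; and the last inequality follows from the fact that $J(\mathbf{p}, x)$ is non-decreasing 
together with $\mathrm{vol}(U) = 2e(L) + 2e(R) + 2e(L, R) + e(U, \bar{U})$. □ 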

Now we can use the convergence lemma to upper bound J(x„M*, x). 

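Lemma 3. For any vertex $v \in V$, let $\mathbf{q}_t = \chi_v M^t$. If for all $t \le T$, all sweep sets $S_i(\mathbf{q}_t) = 
(L_i(\mathbf{q}_t), R_i(\mathbf{q}_t))$ of volume at most $K$ have bipartiteness at least $\Theta$, that is, $\beta(L_i(\mathbf{q}_t), R_i(\mathbf{q}_t)) \ge \Theta$, then 
for any $t \le T$ and any $x \in [0, 2m]$, 

$$J(\mathbf{q}_t, x) \le \frac{2^t x}{K} + \sqrt{\frac{x}{d(v)}}\left(2 - \frac{\Theta^2}{4}\right)^t.$$

Proof. The proof is by induction and is similar to that of Lemma 4.2 in [OT12]. 

If $t = 0$, then the LHS is $x/d(v)$ for $x \le d(v)$ and is 1 for $x \ge d(v)$, and the RHS is at least 
$\sqrt{x/d(v)}$ for any $x \in [0, 2m]$. Thus, the lemma holds in this case. 

Assume the lemma holds for $t - 1$. Since $J(\mathbf{q}_t, x)$ is piecewise linear in $x$, and the RHS is concave, 
we only need to show that the lemma holds for $x = \mathrm{vol}(S_i(\mathbf{q}_t))$ for any $i \le n$. 

• For $x \ge K$, the RHS is at least $2^t$. On the other hand, for any vector $\mathbf{p}$, we have 

$$J(\mathbf{p}M, 2m) = \|\mathbf{p}M\|_1 = \sum_u \Big|\sum_v \mathbf{p}(v)M_{vu}\Big| \le \sum_v |\mathbf{p}(v)| \sum_u |M_{vu}| \le 2\|\mathbf{p}\|_1 = 2J(\mathbf{p}, 2m).$$

Therefore, 

$$J(\mathbf{q}_t, x) \le J(\mathbf{q}_t, 2m) \le 2J(\mathbf{q}_{t-1}, 2m) \le \cdots \le 2^t J(\mathbf{q}_0, 2m) = 2^t.$$

So the lemma holds for $x$ in this case. 

• For $x \le K$, recall that $x = \mathrm{vol}(S_i(\mathbf{q}_t))$. By Lemma 2 and the induction hypothesis, we have 

$$\begin{aligned}
J(\mathbf{q}_t, x) &\le J(\mathbf{q}_{t-1}, x + x\Theta) + J(\mathbf{q}_{t-1}, x - x\Theta) \\
&\le \frac{2^t x}{K} + \sqrt{\frac{x}{d(v)}}\left(\sqrt{1 + \Theta} + \sqrt{1 - \Theta}\right)\left(2 - \frac{\Theta^2}{4}\right)^{t-1} \\
&\le \frac{2^t x}{K} + \sqrt{\frac{x}{d(v)}}\left(2 - \frac{\Theta^2}{4}\right)^t,
\end{aligned}$$

where the last inequality follows from the fact that 

$$\sqrt{1 + \Theta} + \sqrt{1 - \Theta} \le 2 - \frac{\Theta^2}{4}.$$

This completes the proof. □ 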
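
For completeness, the elementary inequality $\sqrt{1 + \Theta} + \sqrt{1 - \Theta} \le 2 - \Theta^2/4$ used in the last step of this proof can be checked as follows for $0 \le \Theta \le 1$. This short derivation is not spelled out in the text; it only uses $\sqrt{1 - y} \le 1 - y/2$ for $y \in [0, 1]$.

```latex
\begin{aligned}
\left(\sqrt{1+\Theta} + \sqrt{1-\Theta}\right)^2
  &= 2 + 2\sqrt{1-\Theta^2}
   \le 2 + 2\left(1 - \frac{\Theta^2}{2}\right)
   = 4 - \Theta^2 \\
  &\le 4 - \Theta^2 + \frac{\Theta^4}{16}
   = \left(2 - \frac{\Theta^2}{4}\right)^2 .
\end{aligned}
```

Since both $\sqrt{1+\Theta} + \sqrt{1-\Theta}$ and $2 - \Theta^2/4$ are nonnegative, taking square roots gives the claim.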

3.4 A Lower Bound for the Potential Function 

We show that if the graph contains a pair subgraph with small bipartiteness, then we can obtain a good 
lower bound on $J(\chi_v M^t, x)$ for some vertex $v$. The following lemma is similar to the upper bounds on the 
escaping probability of random walks given by Oveis Gharan and Trevisan [OT12]. 

Lemma 4. If $U = (L, R)$ has bipartiteness $\beta(L, R) \le \theta$, then for any integer $t \ge 0$, 

1. there exists a vertex $v \in U$ such that $|\mathbf{q}_t|(U) \ge (2 - 2\theta)^t$, where $\mathbf{q}_t = \chi_v M^t$; 

2. there exists a subset $U_\theta \subseteq U$ with $\mathrm{vol}(U_\theta) \ge \mathrm{vol}(U)/2$ such that for any $v \in U_\theta$ and $\mathbf{q}_t = \chi_v M^t$, 

$$J(\mathbf{q}_t, \mathrm{vol}(U)) \ge |\mathbf{q}_t|(U) \ge \frac{1}{400}(2 - 6\theta)^t,$$

where we have assumed that $\theta < 1/3$. 
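Proof. 1. For the first part, we will show that 

$$\mathbf{p}_U M^t(L, -R) \ge (2 - 2\theta)^t. \qquad (3)$$

If it holds, then by the fact that $\mathbf{p}_U M^t(L, -R) = \sum_{v \in U} \frac{d(v)}{\mathrm{vol}(U)}\,\mathrm{sgn}(v, L)\,\chi_v M^t(L, -R)$, where 
$\mathrm{sgn}(v, L)$ equals 1 if $v \in L$ and $-1$ if $v \in R$, we know there exists a vertex $v \in U$ satisfying 
$\mathrm{sgn}(v, L)\,\chi_v M^t(L, -R) \ge (2 - 2\theta)^t$. Then the lemma follows from the fact that $|\mathbf{p}|(U) \ge 
\max\{\mathbf{p}(L, -R), \mathbf{p}(R, -L)\}$ for any $\mathbf{p}$. 
To show inequality (3), we note that for any $t \ge 0$, 

$$\mathbf{p}_U M^t(L, -R) = \mathbf{p}_U D^{-1/2}\mathcal{L}^t D^{1/2}(L, -R) = \psi_U \mathcal{L}^t \psi_U^T.$$

On the other hand, 

$$\psi_U(2I - \mathcal{L})\psi_U^T = \psi_U D^{-1/2}(D + A)D^{-1/2}\psi_U^T = \sum_{u \sim v}\left(\frac{\psi_U(u)}{\sqrt{d(u)}} + \frac{\psi_U(v)}{\sqrt{d(v)}}\right)^2 = \frac{4e(L) + 4e(R) + e(U, \bar{U})}{\mathrm{vol}(U)} \le 2\beta(L, R) \le 2\theta,$$

which implies that 

$$\psi_U \mathcal{L} \psi_U^T \ge 2 - 2\theta. \qquad (4)$$

Now recall that $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1} \le 2$ are the eigenvalues of the Laplacian $\mathcal{L}$. Let 
$\mathbf{v}'_0, \mathbf{v}'_1, \ldots, \mathbf{v}'_{n-1}$ be the corresponding orthonormal eigenvectors of $\mathcal{L}$. If we write $\psi_U = \sum_i \alpha_i \mathbf{v}'_i$, 
then by inequality (4), we have $\sum_i \lambda_i \alpha_i^2 \ge 2 - 2\theta$. Therefore, 

$$\psi_U \mathcal{L}^t \psi_U^T = \sum_i \lambda_i^t \alpha_i^2 \ge \Big(\sum_i \lambda_i \alpha_i^2\Big)^t \ge (2 - 2\theta)^t,$$

where the inequality $\sum_i \lambda_i^t \alpha_i^2 \ge (\sum_i \lambda_i \alpha_i^2)^t$ follows from the fact that $\sum_i \alpha_i^2 = \|\psi_U\|_2^2 = 1$ and Chebyshev's 
sum inequality. 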
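2. For the second part, we show that for any set $Z = (L_Z, R_Z)$ such that $L_Z \subseteq L$, $R_Z \subseteq R$ and 
$\mathrm{vol}(Z) \ge \mathrm{vol}(U)/2$, 

$$\mathbf{p}_Z M^t(L_Z, -R_Z) \ge \frac{1}{400}(2 - 6\theta)^t, \qquad (5)$$

from which we know there exists at least one vertex $v$ in $Z$ such that 

$$|\chi_v M^t|(U) \ge |\chi_v M^t|(Z) \ge \mathrm{sgn}(v, L_Z)\,\chi_v M^t(L_Z, -R_Z) \ge \frac{1}{400}(2 - 6\theta)^t.$$

Then by the choice of $Z$, we know that the set $U_\theta := \{v \in U : |\chi_v M^t|(U) \ge \frac{1}{400}(2 - 6\theta)^t\}$ has volume 
at least $\mathrm{vol}(U)/2$ and the lemma's statement holds. 

On the other hand, we have that $\mathbf{p}_Z M^t(L_Z, -R_Z) = \psi_Z \mathcal{L}^t \psi_Z^T$ for the same reason as in the first 
part of the proof, so we only need to show that 

$$\psi_Z \mathcal{L}^t \psi_Z^T \ge \frac{1}{400}(2 - 6\theta)^t.$$

Let $H = \{i : \lambda_i \ge 2 - 6\theta\}$. For a vector $\mathbf{p}$, define its $H$-norm as $\|\mathbf{p}\|_H := \sqrt{\sum_{i \in H} \langle \mathbf{p}, \mathbf{v}'_i \rangle^2}$. It 
is straightforward to show that $\|\cdot\|_H$ is a seminorm. Recall that $\psi_U = \sum_i \alpha_i \mathbf{v}'_i$ and $\sum_i \lambda_i \alpha_i^2 \ge 
2 - 2\theta$. By the definition of the $H$-norm and the fact that $\|\psi_U\|_2 = 1$, we have 

$$\sum_i \lambda_i \alpha_i^2 \le 2\sum_{i \in H} \alpha_i^2 + (2 - 6\theta)\sum_{i \notin H} \alpha_i^2 = 2\|\psi_U\|_H^2 + (2 - 6\theta)(1 - \|\psi_U\|_H^2),$$

which gives that 

$$\|\psi_U\|_H^2 \ge 2/3.$$

Now we write $\psi_Z = \sum_i \beta_i \mathbf{v}'_i$. It is easy to show that 

$$\begin{aligned}
\|\psi_U - \psi_Z\|_2^2 &= \sum_{v \in Z}\left(\sqrt{\frac{d(v)}{\mathrm{vol}(Z)}} - \sqrt{\frac{d(v)}{\mathrm{vol}(U)}}\right)^2 + \sum_{v \in U \setminus Z}\frac{d(v)}{\mathrm{vol}(U)} \\
&= \sum_{v \in Z} d(v)\left(\frac{1}{\mathrm{vol}(Z)} - \frac{2}{\sqrt{\mathrm{vol}(Z)\mathrm{vol}(U)}} + \frac{1}{\mathrm{vol}(U)}\right) + \frac{\mathrm{vol}(U \setminus Z)}{\mathrm{vol}(U)} \\
&= 2 - 2\sqrt{\frac{\mathrm{vol}(Z)}{\mathrm{vol}(U)}} \\
&\le 2 - \sqrt{2},
\end{aligned}$$

where the last inequality follows from our assumption that $\mathrm{vol}(Z) \ge \mathrm{vol}(U)/2$. 
Hence, 

$$\|\psi_U - \psi_Z\|_H \le \|\psi_U - \psi_Z\|_2 \le \sqrt{2 - \sqrt{2}}.$$

Then by the triangle inequality, we have 

$$\|\psi_Z\|_H \ge \|\psi_U\|_H - \|\psi_U - \psi_Z\|_H \ge \sqrt{\frac{2}{3}} - \sqrt{2 - \sqrt{2}} > \frac{1}{20}.$$

Finally, we have 

$$\psi_Z \mathcal{L}^t \psi_Z^T = \sum_i \lambda_i^t \beta_i^2 \ge (2 - 6\theta)^t \|\psi_Z\|_H^2 \ge \frac{1}{400}(2 - 6\theta)^t.$$

□ 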
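
As a quick numerical sanity check of the constant in the triangle-inequality step of this proof, the gap between the lower bound $\sqrt{2/3}$ on $\|\psi_U\|_H$ and the upper bound $\sqrt{2 - \sqrt{2}}$ on $\|\psi_U - \psi_Z\|_H$ can be evaluated directly. The comparison against $1/20$ and $1/400$ assumes those are the constants intended in the lemma; the variable names are illustrative.

```python
import math

norm_U_H = math.sqrt(2 / 3)           # lower bound on ||psi_U||_H
dist = math.sqrt(2 - math.sqrt(2))    # upper bound on ||psi_U - psi_Z||_H
gap = norm_U_H - dist                 # resulting lower bound on ||psi_Z||_H

print(gap, gap ** 2)
print(gap > 1 / 20, gap ** 2 > 1 / 400)
```

The gap is only slightly above $1/20$, so the constant $1/400$ leaves very little slack.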
3.5 Proof of Theorem 1 

Now we are ready to prove Theorem 1. 

Proof. Clearly the algorithm SwpDB runs in polynomial time. Now we show the correctness of the 
algorithm. Let $\Theta = 4\sqrt{\theta/\epsilon}$. Assume on the contrary that the algorithm SwpDB($k, \theta, \epsilon$) does not find a 
desired subgraph, and thus for any $v \in V$ and $t \le T = \lfloor \epsilon\ln(ck)/(2\theta) \rfloor$, the sweep sets $S_i(\chi_v M^t)$ of volume at 
most $K = 2k^{1+\epsilon}$ have bipartiteness at least $4\sqrt{\theta/\epsilon}$. Then by Lemma 3, for any $v \in V$, 

$$\begin{aligned}
J(\chi_v M^T, k) &\le \frac{2^T k}{K} + \sqrt{\frac{k}{d(v)}}\left(2 - \frac{\Theta^2}{4}\right)^T \\
&\le 2^T\left(\frac{1}{2k^{\epsilon}} + \sqrt{k}\left(1 - \frac{2\theta}{\epsilon}\right)^T\right) \\
&< 2^T (ck)^{-\epsilon},
\end{aligned}$$

where the last inequality follows from the fact that $\epsilon < 1/2$ and that $c^{-\epsilon} > c^{-1} + 1/2$. 

On the other hand, since $U = (L, R)$ is a subgraph such that $\beta(L, R) \le \theta$ and $\mathrm{vol}(U) \le k$, by 
Lemma 4 we know that there exists a vertex $u \in U$ such that 

$$J(\chi_u M^T, k) \ge (2 - 2\theta)^T = 2^T(1 - \theta)^T \ge 2^T(ck)^{-\epsilon},$$

which is a contradiction. □ 



4 A Local Algorithm for Dense Bipartite-like Subgraphs 

We will use the truncation operation to give our local algorithm LocDB (short for "local algorithm for 
dense bipartite subgraph"). Note that in the algorithm we just sweep the support of a given vector, which 
is important for the computation to be local. 

LocDB($v, k, \theta, \epsilon$) 

Input: A vertex $v$, a target volume $k$, a target bipartiteness $\theta < 1/3$ and an error parameter 
$\epsilon < 1/2$. 

Output: A subgraph $(X, Y)$. 

1. Let T = el gg° fc , where c is some constant such that c^ e > 800c 1 + 1. Let £ = 

C " 8 oot ' & = &2*. Let q := X v, r := [q ]f - Let J 7 = ®. 

2. For each time $1 \le t \le T$: 

(a) Compute $\tilde{\mathbf{q}}_t := \mathbf{r}_{t-1}M$, $\mathbf{r}_t := [\tilde{\mathbf{q}}_t]_{\xi_t}$; 

(b) Sweep over the support of $\tilde{\mathbf{q}}_t$ and add to $\mathcal{F}$ all the sweep sets. 

3. Output the subgraph $(X, Y)$ with the smallest bipartiteness ratio among all sets in $\mathcal{F}$. 
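
The loop of LocDB can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the graph is unweighted and stored as a neighbour-set dictionary, the iteration count `T` and the initial threshold `xi0` are passed in explicitly rather than computed from $k$, $\theta$ and $\epsilon$, and the threshold is assumed to double each round ($\xi_t = \xi_0 2^t$), as used in the proof of Proposition 1. The names `locdb`, `apply_M` and `truncate` are hypothetical.

```python
def apply_M(adj, p):
    """Row-vector product pM for M = I - D^{-1}A; touches only supp(p) and its neighbours."""
    q = dict(p)
    for u, pu in p.items():
        share = pu / len(adj[u])
        for v in adj[u]:
            q[v] = q.get(v, 0.0) - share
    return q


def truncate(adj, p, xi):
    """Keep p(u) only where |p(u)| >= xi * d(u)."""
    return {u: x for u, x in p.items() if abs(x) >= xi * len(adj[u])}


def bipartiteness(adj, L, R):
    U = L | R
    vol = sum(len(adj[v]) for v in U)
    inside = sum(1 for S in (L, R) for u in S for v in adj[u] if v in S)
    leaving = sum(1 for u in U for v in adj[u] if v not in U)
    return (inside + leaving) / vol


def locdb(adj, v, T, xi0):
    """Truncated power iteration from chi_v, sweeping the support each round."""
    r = truncate(adj, {v: 1.0}, xi0)          # r_0
    best = None
    for t in range(1, T + 1):
        q = apply_M(adj, r)                   # q~_t = r_{t-1} M
        r = truncate(adj, q, xi0 * 2 ** t)    # r_t, with xi_t = xi_0 * 2^t
        order = sorted((u for u in q if q[u] != 0),
                       key=lambda u: abs(q[u]) / len(adj[u]), reverse=True)
        L, R = set(), set()
        for u in order:                       # sweep over supp(q~_t) only
            (L if q[u] > 0 else R).add(u)
            beta = bipartiteness(adj, L, R)
            if best is None or beta < best[2]:
                best = (frozenset(L), frozenset(R), beta)
    return best


# A 4-cycle on {0, 1} x {2, 3}, joined by the edge 3-4 to a triangle {4, 5, 6}.
graph = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1, 4},
         4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
X, Y, beta = locdb(graph, 0, T=3, xi0=0.01)
print(sorted(X), sorted(Y), beta)
```

Starting from vertex 0 on this toy graph, the second round already sweeps out the pair $(\{0, 1\}, \{2, 3\})$ with bipartiteness 1/9. Because every step only reads the neighbourhoods of vertices in the current support, the cost per round is governed by the volume of that support rather than by the size of the graph.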



Theorem 2. If there is a subset $U = (L, R)$ of volume $\mathrm{vol}(U) \le k$ and bipartiteness $\beta(L, R) \le \theta < 
1/3$, then there exists a subgraph $U_\theta \subseteq U$ satisfying $\mathrm{vol}(U_\theta) \ge \mathrm{vol}(U)/2$ such that if $v \in U_\theta$, then 
for any $0 < \epsilon < 1/2$, the algorithm LocDB($v, k, \theta, \epsilon$) finds a subgraph $(X, Y)$ of volume $O(k^{1+\epsilon})$ and 
bipartiteness $O(\sqrt{\theta/\epsilon})$. Furthermore, the running time of LocDB is $O(\epsilon^2\theta^{-2}k^{1+\epsilon}\ln^3 k)$. 

To prove the theorem, we will use the upper bound and lower bound on the potential function $J(\mathbf{q}_t, x)$ 
given in Section 3. However, to show the correctness of the local algorithm, we need to work with $J(\tilde{\mathbf{q}}_t, x)$ 
instead, which can be bounded by combining the following properties of the truncation operations in the 
algorithm. 

Proposition 1. For any vertex $v$, if $q_t$ and $r_t$ are as defined in the algorithm LocDB, then for any $t \ge 0$,

1. $\|q_t\|_1 \le 2^t$;

2. $|r_t - \chi_v M^t| \le \xi_0 t 2^t d$, where $d$ is the degree vector.

Proof. We prove both inequalities by induction.

1. If $t = 0$, the inequality trivially holds since $q_0 = \chi_v$. Now assume that the inequality holds for $t - 1$. Then

$$\|q_t\|_1 = \|r_{t-1} M\|_1 = \|[q_{t-1}]_{\xi_{t-1}} M\|_1 \le 2\|[q_{t-1}]_{\xi_{t-1}}\|_1 \le 2\|q_{t-1}\|_1 \le 2^t,$$

where the third relation follows from the fact that $\|pM\|_1 \le 2\|p\|_1$ for all $p$; the fourth follows from the definition of truncation; and the last follows from the induction hypothesis.

2. If $t = 0$, the inequality holds since $q_0 = r_0 = [q_0]_{\xi_0} = \chi_v$. If $t = 1$, then $r_1 = [q_1]_{\xi_1} = [r_0 M]_{\xi_1} = [\chi_v M]_{\xi_1}$, and thus $|r_1 - \chi_v M| \le \xi_1 d = 2\xi_0 d$ by the Fact on truncation. Now assume that the inequality holds for $t - 1$, that is, $|r_{t-1} - \chi_v M^{t-1}| \le \xi_0 (t-1) 2^{t-1} d$, which is equivalent to $|(r_{t-1} - \chi_v M^{t-1}) D^{-1}| \le \xi_0 (t-1) 2^{t-1} \mathbf{1}$, where $\mathbf{1}$ is the all-ones vector. On the other hand,

$$|r_t - \chi_v M^t| = |[r_{t-1} M]_{\xi_t} - \chi_v M^t| \le |r_{t-1} M - \chi_v M^t| + \xi_t d = |(r_{t-1} - \chi_v M^{t-1}) D^{-1}(D - A)| + \xi_t d$$

$$\le 2\,\xi_0 (t-1) 2^{t-1} d + \xi_0 2^t d = \xi_0 t 2^t d,$$

where the second to last relation follows from the induction hypothesis and the fact that for any vector $p$, if $|p| \le c\mathbf{1}$ for some constant $c$, then for any vertex $v$,

$$|p(D - A)(v)| = \Big|\sum_u p(u)(D_{vu} - A_{vu})\Big| \le \sum_u |p(u)|(D_{vu} + A_{vu}) \le 2c\,d(v).$$

□
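The two bounds of the proposition can be checked numerically on a small graph. The following is a minimal sketch, assuming $M = D^{-1}(D - A)$ acting on row vectors and assuming that truncation $[p]_\xi$ zeroes every entry with $|p(u)| < \xi d(u)$ (a definition taken as an assumption here). The function names are hypothetical, and `xi0` is chosen small enough that $r_0 = \chi_v$.

```python
def step(p, adj):
    """Return pM for M = D^{-1}(D - A), with p a dense row vector (list)."""
    out = list(p)
    for u, pu in enumerate(p):
        share = pu / len(adj[u])
        for w in adj[u]:
            out[w] -= share
    return out


def truncate(p, adj, xi):
    """Assumed truncation: zero every entry with |p(u)| < xi * d(u)."""
    return [x if abs(x) >= xi * len(adj[u]) else 0.0 for u, x in enumerate(p)]


def check_proposition(adj, v, T, xi0, tol=1e-9):
    """Check ||q_t||_1 <= 2^t and |r_t - chi_v M^t| <= xi0 * t * 2^t * d for t = 1..T."""
    n = len(adj)
    exact = [1.0 if u == v else 0.0 for u in range(n)]  # chi_v M^t, untruncated
    r = truncate(exact, adj, xi0)                        # r_0
    for t in range(1, T + 1):
        exact = step(exact, adj)
        q = step(r, adj)                                 # q_t = r_{t-1} M
        r = truncate(q, adj, xi0 * 2 ** t)               # r_t = [q_t]_{xi_t}
        if sum(abs(x) for x in q) > 2 ** t + tol:
            return False
        slack = xi0 * t * 2 ** t
        if any(abs(r[u] - exact[u]) > slack * len(adj[u]) + tol for u in range(n)):
            return False
    return True
```

For instance, `check_proposition([[1, 5, 3], [0, 2], [1, 3], [2, 4, 0], [3, 5], [4, 0]], 0, 8, 0.01)` runs the check on a 6-cycle with one chord. The tolerance `tol` only absorbs floating-point rounding.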

Note that the second part of Proposition 1 directly implies a lower bound on $J(q_t, x)$. More specifically, we have the following corollary.

Corollary 1. For any set $U$, $|q_t|(U) \ge |r_t|(U) \ge |\chi_v M^t|(U) - \xi_0 t 2^t \mathrm{vol}(U)$.

Now we give an upper bound on $J(q_t, x)$.

Lemma 5. For any vertex $v$, $T > 0$, $\Theta \le 1$, if for any $t \le T$, the sweep sets $S_i(q_t)$ of volume at most $K$ have bipartiteness at least $\Theta$, then for any $0 \le t \le T$ and $0 \le x \le 2m$,

$$J(q_t, x) \le 2^t\left(\frac{x}{K} + \sqrt{x}\left(1 - \frac{\Theta^2}{8}\right)^t\right).$$

Proof. We prove the lemma by combining the following observations and the proof of Lemma 3.

First we note that for any $t \le T$ and $x \le 2m$, $J(r_t, x) \le J(q_t, x)$. This follows from the definition of the potential function. More specifically, let $w \in [0, 1]^n$ be a vector that achieves $J(r_t, x)$, that is, $\sum_u w(u) d(u) = x$ and $J(r_t, x) = \sum_v |r_t|(v) w(v)$. Then $J(r_t, x) \le \sum_v |q_t|(v) w(v) \le J(q_t, x)$ since for any $v$, $|r_t|(v) \le |q_t|(v)$. Furthermore, by the relation between $q_t$ and $r_{t-1} M$, we can always guarantee that $S_i(q_t) = S_i(r_{t-1} M)$ for every $i \le n$.

Then by the conditions given in the lemma and the convergence Lemma 2, for $x = \mathrm{vol}(S_i(q_t))$, we have

$$J(q_t, x) = J(r_{t-1} M, x) \le J(r_{t-1}, x + \Theta x) + J(r_{t-1}, x - \Theta x)$$

$$\le J(q_{t-1}, x + \Theta x) + J(q_{t-1}, x - \Theta x). \qquad (6)$$

Finally, we can use the same induction as in the proof of Lemma 3 to show that the lemma's statement holds. □
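The first observation in the proof, $J(r_t, x) \le J(q_t, x)$, is easy to see in code. The sketch below assumes that $J(p, x)$ is the maximum of $\sum_v |p|(v) w(v)$ over $w \in [0,1]^n$ with $\sum_u w(u) d(u) = x$ (the proof only states that some $w$ achieves $J$, so the maximization is an assumption here). Under that reading $J$ is a fractional knapsack and is attained greedily in order of $|p(v)|/d(v)$. The name `potential` is hypothetical.

```python
def potential(p, deg, x):
    """Greedy value of J(p, x): fill a degree budget x in order of |p(v)| / d(v)."""
    order = sorted(range(len(p)), key=lambda v: abs(p[v]) / deg[v], reverse=True)
    total, budget = 0.0, float(x)
    for v in order:
        if budget <= 0:
            break
        w = min(1.0, budget / deg[v])  # fraction w(v) of vertex v that still fits
        total += w * abs(p[v])
        budget -= w * deg[v]
    return total


# Zeroing entries (as truncation does) can only lower the potential.
q = [0.5, -0.3, 0.2]
r = [0.5, 0.0, 0.2]
deg = [1, 2, 1]
monotone = all(potential(r, deg, x) <= potential(q, deg, x) + 1e-12
               for x in (0.5, 1, 2, 3, 4))
```

Here `r` is `q` with one entry zeroed, and `monotone` records that the potential of `r` never exceeds that of `q` at the sampled budgets.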
Now we are ready to prove Theorem 2.

Proof of Theorem 2. We first show the correctness of the local algorithm and then bound its running time.

• (Correctness.) As stated in the algorithm, we choose $T = \frac{\epsilon \ln (c_0 k)}{6\theta}$. Let $U_\theta = U^T \subseteq U$ be the subset as described in Lemma 4, which has volume at least $\mathrm{vol}(U)/2$. Now let $v \in U_\theta$ and assume that in the algorithm LocDB($v$, $k$, $\theta$, $\epsilon$), for any $t \le T$, all the sweep sets $S_i(q_t)$ of volume at most $800k^{1+\epsilon}$ have bipartiteness at least $\Theta = \sqrt{48\theta/\epsilon}$. Then by Lemma 5, we have

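$$J(q_T, \mathrm{vol}(U)) \le J(q_T, k) \le \cdots \le 2^T\left(\frac{1}{800 k^{\epsilon}} + \frac{1}{c_0 k^{1/2}}\right) < \frac{2^T (c_0 k)^{-\epsilon}}{800},$$

where the last inequality follows from the fact that $\epsilon < 1/2$ and that $c_0^{-\epsilon} > 800 c_0^{-1} + 1$.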
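On the other hand, by Lemma 4 and Corollary 1, and the fact that $\xi_0 T = \frac{c_0^{-\epsilon} k^{-1-\epsilon}}{800}$, we have

$$|q_T|(U) \ge |\chi_v M^T|(U) - \xi_0 T 2^T \mathrm{vol}(U) \ge 2^T\left(\frac{1}{400}(1 - 3\theta)^T - \xi_0 T k\right)$$

$$\ge 2^T\left(\frac{1}{400} e^{-\epsilon \ln (c_0 k)} - \frac{(c_0 k)^{-\epsilon}}{800}\right) = \frac{2^T (c_0 k)^{-\epsilon}}{800},$$

which is a contradiction. Therefore, there exists at least one sweep set of volume at most $O(k^{1+\epsilon})$ and bipartiteness at most $O(\sqrt{\theta/\epsilon})$.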
• (Running time.) We first bound the time required in each iteration. For any $t \le T$, instead of performing the dense vector multiplication to compute $q_t$, we keep a record of the support of $r_t$, which has volume at most $\|q_t\|_1/\xi_t \le 2^t/(\xi_0 2^t) = \xi_0^{-1}$. By definition, both the volume and the computational time of $q_{t+1}$ are proportional to $\mathrm{vol}(\mathrm{supp}(r_t))$, which is at most $\xi_0^{-1}$ by the property of the truncation operation.

During the sweep process, we only need to sweep the vertices in $\mathrm{supp}(r_t)$. Sorting these vertices requires time $O(|\mathrm{supp}(r_t)| \ln |\mathrm{supp}(r_t)|) \le O(\mathrm{vol}(\mathrm{supp}(r_t)) \ln \mathrm{vol}(\mathrm{supp}(r_t)))$. Computing the bipartiteness of the sweep sets requires time $O(\mathrm{vol}(\mathrm{supp}(r_t)))$. Therefore, in a single iteration, the computation takes time $O(\mathrm{vol}(\mathrm{supp}(r_t)) + \mathrm{vol}(\mathrm{supp}(r_t)) \ln \mathrm{vol}(\mathrm{supp}(r_t))) = O(\xi_0^{-1} + \xi_0^{-1} \ln \xi_0^{-1})$.

Since the algorithm takes $T$ iterations, the total running time is thus bounded by $O(T \xi_0^{-1} \ln \xi_0^{-1}) = O(\epsilon^2 k^{1+\epsilon} \ln^3 k/\theta^2)$.

□ 
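The support bound that drives the running-time argument, $\mathrm{vol}(\mathrm{supp}([q]_\xi)) \le \|q\|_1/\xi$, can be illustrated directly. A minimal sketch, assuming that truncation keeps an entry exactly when $|q(v)| \ge \xi d(v)$ (taken as an assumption here), so that every surviving vertex contributes at most $|q(v)|/\xi$ to the volume. The function names and the random test vector are hypothetical.

```python
import random


def truncate(q, deg, xi):
    """Assumed truncation: keep q(v) only where |q(v)| >= xi * d(v)."""
    return {v: x for v, x in q.items() if abs(x) >= xi * deg[v]}


def support_volume(p, deg):
    """Sum of degrees over the support of the sparse vector p."""
    return sum(deg[v] for v in p)


def bound_holds(q, deg, xi, tol=1e-9):
    """Check vol(supp([q]_xi)) <= ||q||_1 / xi, up to rounding."""
    l1 = sum(abs(x) for x in q.values())
    return support_volume(truncate(q, deg, xi), deg) <= l1 / xi + tol


random.seed(0)
deg = {v: random.randint(1, 5) for v in range(50)}
q = {v: random.uniform(-1, 1) for v in range(50)}
checks = [bound_holds(q, deg, xi) for xi in (0.01, 0.05, 0.2, 1.0)]
```

Because the bound depends only on $\|q\|_1$ and $\xi$, and not on the number of vertices, the work per iteration stays independent of the size of the graph.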

5 Spectral Characterization of the Small Dense Bipartite-like Subgraphs

Recall that $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$ are the eigenvalues of the (normalized) Laplacian of the graph (and also of $M$).



Theorem 3. If $\lambda_{n-k} \ge 2 - 2\eta$, then there is a polynomial time algorithm such that for any $0 < \epsilon < 1$, it finds a subset $(X, Y)$ of volume at most $O(\mathrm{vol}(G)/k^{1-\epsilon})$ and bipartiteness $O(\sqrt{16(\eta/\epsilon) \log_k n})$.

Proof. Given $k$, $\eta$, $\epsilon$, we set $T = \cdots$ and $K = \cdots$, and run steps 2 and 3 of the algorithm SwpDB to find a subgraph, which clearly runs in polynomial time. Assume that during this process, all the sweep sets $S_i(\chi_v M^t)$ of volume at most $K$ have bipartiteness at least $\Theta = \sqrt{16(\eta/\epsilon) \log_k n}$, for any $v \in V$ and $t \le T$. Then, by Lemma 3, we have that for any $v \in V$,

$$\chi_v M^T \chi_v^T \le J(\chi_v M^T, d(v)) \le \cdots$$

Therefore,

$$\sum_{v \in V} \chi_v M^T \chi_v^T \le \cdots < 2^T k^{1-\epsilon}.$$
On the other hand, by the trace formula,

$$\sum_{v \in V} \chi_v M^T \chi_v^T = \mathrm{Tr}(M^T) = \sum_{i=0}^{n-1} \lambda_i^T \ge k(2 - 2\eta)^T = 2^T k (1 - \eta)^T \ge 2^T k^{1-\epsilon},$$

which is a contradiction. □
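The trace identity used in the last step can be checked on a graph whose spectrum is known in closed form. The sketch below uses the $n$-cycle as an illustrative choice (it does not appear in the text): every degree is 2, so $M = D^{-1}(D - A) = I - A/2$, and its eigenvalues are $1 - \cos(2\pi j/n)$ for $j = 0, \dots, n-1$. All function names are hypothetical.

```python
import math


def cycle_walk_matrix(n):
    """M = D^{-1}(D - A) = I - A/2 for the n-cycle, n >= 3."""
    M = [[0.0] * n for _ in range(n)]
    for v in range(n):
        M[v][v] = 1.0
        M[v][(v + 1) % n] -= 0.5
        M[v][(v - 1) % n] -= 0.5
    return M


def mat_mul(A, B):
    """Plain dense matrix product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace_of_power(M, T):
    """Tr(M^T), which equals sum_v chi_v M^T chi_v^T."""
    P = M
    for _ in range(T - 1):
        P = mat_mul(P, M)
    return sum(P[v][v] for v in range(len(M)))


def cycle_eigenvalues(n):
    """Eigenvalues of M for the n-cycle, sorted ascending (0-indexed)."""
    return sorted(1 - math.cos(2 * math.pi * j / n) for j in range(n))


n, T = 6, 4
tr = trace_of_power(cycle_walk_matrix(n), T)
lam = cycle_eigenvalues(n)
spectral_sum = sum(x ** T for x in lam)
```

For the 6-cycle the eigenvalues are $0, 0.5, 0.5, 1.5, 1.5, 2$, so with $k = 3$ and $\eta = 0.25$ the hypothesis $\lambda_{n-k} \ge 2 - 2\eta$ holds and the trace dominates $k(2 - 2\eta)^T$, as in the chain of inequalities above.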